Survival Analysis and Kaplan-Meier plotting functions
Prepares a comprehensive dataset for survival analysis using the counting process approach. Computes risk sets, event counts, Kaplan-Meier estimates, log-rank (Fleming-Harrington (\(\rho,\gamma\)) weights) and Cox model results, and quantile estimates for two groups (e.g., treatment and control), optionally handling (subject-specific case) weights (such as inverse probability weights) and stratification.
Time-dependent weights for log-rank statistics are implemented via wt.rg.S function: Supporting a variety of commonly used and custom weighting schemes for weighted log-rank and related tests. The function is flexible and can be used for Fleming-Harrington, Schemper, Xu & O’Quigley (XO), Maggir-Burman (MB), custom time-based, and exponential variants of Fleming-Harrington weights.
Key Features: - Calculates weights for use in weighted
log-rank and related survival tests. - Supports standard
Fleming-Harrington weights (scheme = "fh"). - Implements
Schemper weights (scheme = "schemper"), XO weights
(scheme = "XO"), MB weights (scheme = "MB"). -
Allows for custom time-based weights
(scheme = "custom_time"). - Supports exponential variants
of Fleming-Harrington weights (scheme = "fh_exp1",
scheme = "fh_exp2").
Typical Inputs: - S: Kaplan-Meier survival
curve (vector). - scheme: Weighting scheme (e.g.,
"fh", "schemper", "XO",
"MB", "custom_time", "fh_exp1",
"fh_exp2"). - rho, gamma:
Weighting parameters (for "fh"). - Scensor:
(Optional) Censoring distribution (for
"schemper"). - Ybar: (Optional) Risk
set sizes (for "XO"). - tpoints,
t.tau, w0.tau, w1.tau:
(Optional) For custom time-based weights. -
mb_tstar: (Optional) For MB weights. -
details: (Optional) Logical; whether to return
detailed output.
Typical Outputs: - A vector of weights corresponding to the
input time points (or a list with details if
details = TRUE).
Main Steps: 1. Determines the weighting scheme based on the
scheme argument. 2. Computes weights using the chosen
formula and input survival/censoring curves. 3. Handles special cases
for Schemper, XO, MB, custom time-based, and exponential FH weights. 4.
Returns the computed weight vector (or details if requested).
Creates Kaplan-Meier survival plots for two groups (e.g., treatment vs. control), allowing for the use of weights (such as inverse probability weights) in the estimation. The function can display survival curves, confidence intervals, risk tables, and optionally subgroup curves or additional annotations.
KM_diff compares Kaplan-Meier survival curves between two groups (typically treatment vs. control) using time-to-event data. It calculates survival estimates, their differences, and provides both pointwise and simultaneous confidence intervals. The function can also perform resampling to construct simultaneous confidence bands.
Allows for arbitrary weights (non-negative) to implement (for example) propensity-score adjustment (IPW).
Plots the difference (via KM_diff calculations) in Kaplan-Meier survival curves between two groups (e.g., treatment vs. control), optionally including simultaneous confidence bands and subgroup curves. The function also displays risk tables for the overall population and specified subgroups.
Computes the weighted log-rank test statistic, its variance, the
difference in survival between two groups at a specified time point
(tzero), the variance of this difference, their covariance,
and correlation. It supports flexible time-dependent weighting schemes
via the wt.rg.S function.
library(tinytex)
library(survival)
# to install weightedKMplots
# library(devtools)
# github_install("larry-leon/weightedKMplots")
library(weightedKMplots)
# source revised functions (to be updated in weightedKMplots)
source("~/Library/CloudStorage/OneDrive-MerckSharp&DohmeLLC/documents/GitHub/weightedKMplots/R/df_counting_improved.R")
source("~/Library/CloudStorage/OneDrive-MerckSharp&DohmeLLC/documents/GitHub/weightedKMplots/R/km_calculations_helpers.R")
source("~/Library/CloudStorage/OneDrive-MerckSharp&DohmeLLC/documents/GitHub/weightedKMplots/R/kmplotting_helpers.R")
#survminer is only used for 1 plot comparison
library(survminer)
—- Data Preparation —- Prepare GBSG data
df_gbsg <- gbsg
df_gbsg$time_months <- df_gbsg$rfstime / 30.4375
tte.name <- "time_months"
event.name <- "status"
treat.name <- "hormon"
arms <- c("treat", "control")
—- Main Analyses —-
GBSG - ITT Analysis
dfcount_gbsg <- df_counting(df=df_gbsg, tte.name=tte.name, event.name=event.name, treat.name=treat.name, arms=arms, by.risk = 12, scheme = "fh", lr.digits = 4, cox.digits =3)
—- Plotting —- GBSG KM plots
par(mfrow=c(1,2))
# Mine
# Set ymax a little above 1.0 to allow for logrank placement in topleft
# topleft is default; xmed.fraction = 0.65 positions the display of the median estimates below the HR estimates ("topright" legend [default])
# For details, please see summary of xmed.fraction in the Appendix below.
plot_weighted_km(dfcount=dfcount_gbsg, conf.int=FALSE, show.logrank = TRUE, put.legend.lr = "topleft", ymax = 1.05, xmed.fraction = 0.65)
title(main="GBSG (trial) data un-weighted")
# Compare with survfit
plot_km(df=df_gbsg, tte.name=tte.name, event.name=event.name, treat.name=treat.name)
title(main="GBSG (trial) data un-weighted
via survfit")
Compare with survminer
plot_weighted_km(dfcount=dfcount_gbsg, conf.int=TRUE, show.logrank = TRUE, ymax = 1.05)
title(main="GBSG (trial) data un-weighted")
km_fit <- survfit(Surv(time_months,status) ~ hormon, data=df_gbsg)
ggsurvplot(km_fit,conf.int=TRUE, risk.table = TRUE, break.time.by=12, xlim=c(0,86),
tables.height = 0.2, tables.theme = theme_cleantable(), censor=TRUE, main = "GBSG (trial) data un-weighted")
—- Propensity Score Weighting (Rotterdam) —-
Prepare Rotterdam data
gbsg_validate <- within(rotterdam, {
rfstime <- ifelse(recur == 1, rtime, dtime)
t_months <- rfstime / 30.4375
time_months <- t_months
status <- pmax(recur, death)
ignore <- (recur == 0 & death == 1 & rtime < dtime)
status2 <- ifelse(recur == 1 | ignore, recur, death)
rfstime2 <- ifelse(recur == 1 | ignore, rtime, dtime)
time_months2 <- rfstime2 / 30.4375
grade3 <- ifelse(grade == "3", 1, 0)
treat <- hormon
id <- as.numeric(1:nrow(rotterdam))
SG0 <- ifelse(er <= 0, 0, 1)
})
Node positive only to correspond to GBSG population
df_rotterdam <- subset(gbsg_validate, nodes > 0)
Propensity score model
fit.ps <- glm(treat ~ age + meno + size + grade3 + nodes + pgr + chemo + er, data=df_rotterdam, family="binomial")
pihat <- fit.ps$fitted
pihat.null <- glm(treat ~ 1, family="binomial", data=df_rotterdam)$fitted
wt.1 <- pihat.null / pihat
wt.0 <- (1 - pihat.null) / (1 - pihat)
df_rotterdam$sw.weights <- ifelse(df_rotterdam$treat == 1, wt.1, wt.0)
# truncated weights
df_rotterdam$sw.weights_trunc <- with(df_rotterdam, pmin(pmax(sw.weights, quantile(sw.weights, 0.05)), quantile(sw.weights, 0.95)))
Un-weighted and weighted analyses
dfcount_rotterdam_unwtd <- get_dfcounting(df=df_rotterdam, tte.name=tte.name, event.name=event.name, treat.name=treat.name, arms=arms, by.risk=24)
dfcount_rotterdam_wtd <- get_dfcounting(df=df_rotterdam, tte.name=tte.name, event.name=event.name, treat.name=treat.name, arms=arms, by.risk=24,
weight.name="sw.weights")
—- Plotting Weighted vs Unweighted —-
par(mfrow=c(1,2))
plot_weighted_km(dfcount=dfcount_rotterdam_unwtd, xmed.fraction = 0.65)
title(main="Rotterdam un-weighted K-M curves")
plot_weighted_km(dfcount=dfcount_rotterdam_wtd, xmed.fraction = 0.65)
title(main="Rotterdam weighted K-M curves")
Compare with GBSG trial data
par(mfrow=c(1,2))
plot_weighted_km(dfcount=dfcount_gbsg, conf.int = TRUE, xmed.fraction = 0.65, show.logrank = TRUE, ymax = 1.05)
title(main="GBSG (trial) data un-weighted K-M curves")
plot_weighted_km(dfcount=dfcount_rotterdam_wtd, conf.int = TRUE, xmed.fraction = 0.65, show.logrank = TRUE, ymax = 1.05)
title(main="Rotterdam weighted K-M curves")
Note that the first graph displays 20 (1st 20) resampled approximations to the centered survival difference
\(\Delta(t) = (\widehat{S_1} - \widehat{S_0})(t) - (S_{1} - S_{0})(t)),\) for timepoints \(t\) within “qtau” quartiles of the event times (i.e., for qtau = 2.5% we calculate \(\Delta\) for time points within the 2.5% and 97.5% quantiles).
—- Simultaneous CI bands —-
par(mfrow=c(1,2))
temp <- plotKM.band_subgroups(
df=df_gbsg,
xlabel="Months", ylabel="Survival difference",
yseq_length=5, cex_Yaxis=0.7, risk_cex=0.7,
tau_add=42, by.risk=6, risk_delta=0.075, risk.pad=0.03,
tte.name = "time_months", treat.name = "hormon", event.name = "status", draws.band = 5000, qtau = 0.025, show_resamples = TRUE
)
legend("topleft", c("Difference estimate", "95% pointwise CIs"),
lwd=c(2,1), col=c("black"), lty=c(1,2), bty="n", cex=0.75)
title(main="ITT population")
Subgroup differences along with ITT simultaneous band:
Note that er=0 subgroup was identified as questionable benefit.
—- Subgroup Band Plot —-
par(mfrow=c(1,1))
sg_labs <- c("er == 0","er > 0", "meno == 0", "meno ==1")
sg_cols <- c("blue", "brown", "green", "turquoise")
# Randomly sample colors from the built-in palette
# n_sg <- length(sg_labs)
# set.seed(123) # for reproducibility
# sg_cols <- sample(colors(), n_sg)
temp <- plotKM.band_subgroups(
df=df_gbsg,
sg_labels = sg_labs,
sg_colors = sg_cols,
xlabel="Months", ylabel="Survival difference",
yseq_length=5, cex_Yaxis=0.7, risk_cex=0.7,
tau_add=42, by.risk=6, risk_delta=0.05, risk.pad=0.03, draws.band = 5000,
tte.name = "time_months", treat.name = "hormon", event.name = "status"
)
legend("topleft", c("ITT", sg_labs),
lwd=2, col=c("black", sg_cols), bty="n", cex=0.75)
title(main="ITT and subgroups")
Look at er>0 subgroup population
—- Simultaneous CI bands —-
par(mfrow=c(1,2))
temp <- plotKM.band_subgroups(
df = subset(df_gbsg, er > 0),
xlabel="Months", ylabel="Survival difference",
yseq_length=5, cex_Yaxis=0.7, risk_cex=0.7,
tau_add=42, by.risk=6, risk_delta=0.075, risk.pad=0.03,
tte.name = "time_months", treat.name = "hormon", event.name = "status", draws.band = 5000, qtau = 0.025, show_resamples = TRUE
)
legend("topleft", c("Difference estimate", "95% pointwise CIs"),
lwd=c(2,1), col=c("black"), lty=c(1,2), bty="n", cex=0.75)
title(main="Estrogen receptor positive (er>0) sub-population")
Rotterdam data example with propensity score weighting
—- Simultaneous CI bands —-
par(mfrow=c(1,2))
temp <- plotKM.band_subgroups(
df=df_rotterdam,
xlabel="Months", ylabel="Survival difference",
yseq_length=5, cex_Yaxis=0.7, risk_cex=0.7,
tau_add=42, by.risk = 12, risk_delta=0.075, risk.pad=0.03,
tte.name = "time_months", treat.name = "hormon", event.name = "status", weight.name = "sw.weights",
draws.band = 5000, qtau = 0.025, show_resamples = TRUE
)
legend("topleft", c("Difference estimate", "95% pointwise CIs"),
lwd=c(2,1), col=c("black"), lty=c(1,2), bty="n", cex=0.75)
title(main="ITT population weighted")
dfcount_rotterdam_wtd <- get_dfcounting(df=df_rotterdam, tte.name=tte.name, event.name=event.name, treat.name=treat.name, arms=arms, by.risk=24,
weight.name="sw.weights", check.seKM = TRUE, draws = 0)
Note that draws = 5000 does not take much time, but can be reduced to around 500 for sufficient accuracy.
dfcount_rotterdam_wtd <- get_dfcounting(df=df_rotterdam, tte.name=tte.name, event.name=event.name, treat.name=treat.name, arms=arms, by.risk=24,
weight.name="sw.weights", check.seKM = TRUE, draws = 5000)
# Compare with adjustedCurves
library(adjustedCurves)
library(pammtools)
##
## Attaching package: 'pammtools'
## The following object is masked from 'package:stats':
##
## filter
df_rotterdam$hormon2 <- with(df_rotterdam, factor(hormon, labels=c("No","Yes")))
ps_model <- glm(hormon2 ~ age+meno+size+grade+nodes+pgr+chemo+er, data = df_rotterdam, family="binomial")
iptw <- adjustedsurv(data=df_rotterdam, variable="hormon2", ev_time="time_months", event = "status", method = "iptw_km", treatment_model = ps_model, conf_int = TRUE)
plot_weighted_km(dfcount=dfcount_rotterdam_wtd, conf.int = TRUE)
title(main="Rotterdam weighted K-M curves")
plot(iptw, conf_int = TRUE, legend.title = "Hormonal therapy")
# Check SEs
par(mfrow=c(1,2))
df_mine <- dfcount_rotterdam_wtd
df_check <- subset(iptw$adj, group == "No")
yymax <- max(c(sqrt(df_mine$sig2_surv0[df_mine$idv0.check]), df_check$se))
toget <- df_mine$idv0.check
tt <- df_mine$at.points[toget]
yy <- sqrt(df_mine$sig2_surv0)[toget]
plot(tt,yy, type="s", lty=1, col="lightgrey", lwd=4, ylim=c(0,yymax), xlab="time", ylab="SEs")
with(df_check, lines(time, se, type="s", lty=2, lwd=1, col="red"))
title(main="Control SEs")
df_check <- subset(iptw$adj, group == "Yes")
yymax <- max(c(sqrt(df_mine$sig2_surv1[df_mine$idv1.check]), df_check$se))
toget <- df_mine$idv1.check
tt <- df_mine$at.points[toget]
yy <- sqrt(df_mine$sig2_surv1)[toget]
plot(tt,yy, type="s", lty=1, col="lightgrey", lwd=4, ylim=c(0,yymax), xlab="time", ylab="SEs")
with(df_check, lines(time, se, type="s", lty=2, lwd=1, col="red"))
title(main = "Treat SEs")
# dfcount_gbsg contains logrank (chisq) stats from specified scheme (default is standard logrank rho=0 and gamma=0) from 3 sources:
# (1) wlr_estimates function
# (2) survdiff which only provides rho options so gamma !=0 is not available
# (3) z_score_calculations function (from a Cox score-test perspective)
check_results(dfcount_gbsg)
## zlr_sq=8.564781, logrank=8.564781, zCox_sq=8.564781
# Check wlr_dhat_estimates which calculates from any scheme based on counting dataset
temp <- wlr_dhat_estimates(dfcounting = dfcount_gbsg, rho = 0, gamma = 0, scheme = "fh")
with(temp,lr^2/sig2_lr)
## [1] 8.564781
res <- df_counting(df=df_gbsg, tte.name=tte.name, event.name=event.name, treat.name=treat.name, arms=arms, strata.name="grade")
stratified
with(res,lr_stratified^2/sig2_lr_stratified)
## [1] 7.395797
# survdiff stratified
res$logrank_results$chisq
## [1] 7.395797
# Cox score stratified
with(res,z.score_stratified^2)
## [1] 7.395797
non-stratified (should be same as above results without stratification)
with(res,lr^2/sig2_lr)
## [1] 8.564781
with(res,z.score^2)
## [1] 8.564781
Add duplicate subject (pid==51) for testing how plots mark ties
df_add <- subset(df_gbsg, pid == 51)
df_add$status <- 1.0
df_mod <- rbind(df_gbsg, df_add)
dfcount_mod <- get_dfcounting(df=df_mod, tte.name=tte.name, event.name=event.name, treat.name=treat.name, arms=arms)
check_results(dfcount_mod)
## zlr_sq=8.551190, logrank=8.551190, zCox_sq=8.551190
For modified data
par(mfrow=c(1,2))
plot_weighted_km(dfcount_mod)
plot_km(df=df_mod, tte.name=tte.name, event.name=event.name, treat.name=treat.name)
A set of functions for advanced Kaplan-Meier survival analysis, including estimation, event/risk matrix construction, resampling for confidence bands, and group comparison.
Prepares a comprehensive dataset for survival analysis using the counting process approach. Computes risk sets, event counts, Kaplan-Meier estimates, log-rank and Cox model results, and quantile estimates for two groups (e.g., treatment and control), optionally handling weights and stratification.
Key Features: - Calculates risk set sizes and event counts at specified time points. - Computes weighted and unweighted Kaplan-Meier survival estimates for each group. - Provides log-rank and Cox model test results (including hazard ratios and p-values). - Estimates quantiles (e.g., median survival) and their confidence intervals. - Supports stratification and weighted analyses. - Performs internal checks for consistency and validity of KM curves and quantile calculations.
Typical Inputs: - df: Data frame
containing survival data. - tte.name: Name of the
time-to-event column. - event.name: Name of the event
indicator column (0/1). - treat.name: Name of the treatment
group column (0/1). - weight.name: (Optional) Name
of the weights column. - strata.name: (Optional)
Name of the stratification column. - arms: Character vector
of group labels. - Additional parameters for risk table intervals,
quantile estimation, and statistical tests.
Typical Outputs: - A list containing: - Risk set sizes and event counts for each group and time point. - Kaplan-Meier survival estimates and variances for each group. - Log-rank and Cox model results (test statistics, p-values, hazard ratios). - Quantile estimates (e.g., median survival) and confidence intervals. - Indices and times for events and censoring. - (If stratified) Results for each stratum.
Main Steps: 1. Validates and prepares input data. 2. Computes risk sets and event counts for each group and time point. 3. Calculates KM survival estimates and variances. 4. Performs log-rank and Cox model tests. 5. Estimates quantiles and confidence intervals. 6. Checks for consistency in KM curves and quantile calculations. 7. Returns a comprehensive list of results for downstream analysis and plotting.
Creates Kaplan-Meier survival plots for two groups (e.g., treatment vs. control), allowing for the use of weights (such as inverse probability weights) in the estimation. The function can display survival curves, confidence intervals, risk tables, and optionally subgroup curves or additional annotations.
This function calls KM_plot_2sample_weighted_counting
described below.
Key Features: - Plots weighted Kaplan-Meier survival curves for two groups. - Supports display of confidence intervals and risk tables. - Can handle weighted data (e.g., for marginal or adjusted survival). - Allows customization of plot appearance (colors, labels, axis limits, etc.). - May support subgroup overlays or additional plot annotations (depending on implementation).
Typical Inputs: - df: Data frame
containing survival data. - tte.name: Name of the
time-to-event column. - event.name: Name of the event
indicator column (0/1). - treat.name: Name of the treatment
group column (0/1). - weight.name: (Optional) Name
of the weights column. - Additional graphical and control parameters
(e.g., colors, axis labels, risk table options).
Typical Outputs: - A Kaplan-Meier plot with survival curves for each group, possibly with confidence intervals and risk tables. - (Optionally) Returns an object with survival estimates, time points, and other relevant results.
Main Steps: 1. Validates and prepares input data. 2. Computes weighted Kaplan-Meier survival estimates for each group. 3. Plots survival curves, confidence intervals, and risk tables. 4. Adds optional annotations or subgroup curves if specified.
The KM_plot_2sample_weighted_counting function plots
Kaplan-Meier survival curves for two groups (e.g., treatment and
control) using weighted counting data, with options for displaying risk
tables, legends, median survival, and statistical results (Cox model,
log-rank test).
Key Features: - Input validation for required data fields - Extraction of survival, censoring, and risk set data for both groups - Optional checks for valid KM curves - Plotting of KM curves with customizable appearance - Optional risk table below the plot - Legends for arms, Cox model, and log-rank test - Optional annotation of median survival - Extensive customization options
Parameters:
xmed.fraction ParameterPurpose: Controls the horizontal position of the median survival annotation (or other quantile annotation) on the Kaplan-Meier plot.
How it works: - Numeric value between 0 and 1. -
Specifies the fraction of the x-axis (time axis) at which to place the
median annotation text. - xmed.fraction = 0 places the
annotation at the far left of the plot. - xmed.fraction = 1
places it at the far right. - Typical value like
xmed.fraction = 0.80 places the annotation at 80% of the
way from left to right.
Example: If your time axis goes from 0 to 24 months
and xmed.fraction = 0.80, the annotation will be placed at
19.2 months on the x-axis.
Usage in the function: Allows fine-tuning of the median (or quantile) label position to avoid overlap with curves or other plot elements.
dfcount: List with all necessary datashow.cox, show.logrank: Show Cox/log-rank
resultsshow_arm_legend: Show arm legendarms: Names of the two groupsput.legend.arms, put.legend.cox,
put.legend.lr: Legend positionscol.0, col.1: Colors for the two
groupsltys, lwds: Line types and widthsshow.med: Show median survival annotationWorkflow: 1. Validate input 2. Extract data for both groups 3. Check KM curves (optional) 4. Plot KM curves 5. Add risk table (optional) 6. Add legends (arms, Cox, log-rank) 7. Annotate median survival (optional)
In summary, this function is a comprehensive tool for visualizing and annotating weighted Kaplan-Meier survival curves for two groups, with flexible options for statistical annotation and plot customization.
Compares Kaplan-Meier survival curves between two groups (e.g., treatment vs. control) using time-to-event data. Calculates survival estimates, their differences, and provides both pointwise and simultaneous confidence intervals. Optionally, performs resampling for simultaneous confidence bands.
Usage:
KM_estimates(ybar, nbar, sig2w_multiplier = NULL)
Arguments:
- ybar: Vector of risk set sizes at each time point. -
nbar: Vector of event counts at each time point. -
sig2w_multiplier: Optional vector for variance
calculation.
Value:
A list with survival estimates (S_KM) and their variances
(sig2_KM).
Description:
Constructs matrices indicating event and risk status for each subject at
specified time points.
Usage:
get_event_risk_matrices(U, at.points)
Arguments:
- U: Vector of observed times. - at.points:
Vector of time points for evaluation.
Value:
A list with event and risk matrices.
Description:
Performs resampling to generate survival curves for a group, used for
constructing confidence bands.
Usage:
resampling_survival(U, W, D, at.points, draws.band, surv, G_draws)
Arguments:
- U: Vector of observed times. - W: Vector of
weights. - D: Vector of event indicators. -
at.points: Time points for evaluation. -
draws.band: Number of resampling draws. -
surv: Survival estimates. - G_draws: Matrix of
random draws.
Value:
Matrix of resampled survival curves.
Description:
Compares Kaplan-Meier survival curves between two groups (e.g.,
treatment vs. control) using time-to-event data. Calculates survival
estimates, their differences, and provides both pointwise and
simultaneous confidence intervals. Optionally, performs resampling for
simultaneous confidence bands.
Usage:
KM_diff(
df, tte.name, event.name, treat.name, weight.name = NULL,
at.points = sort(df[[tte.name]]), alpha = 0.05, seedstart = 8316951,
draws = 0, risk.points, draws.band = 0, tau.seq = 0.25, qtau = 0.025,
show_resamples = TRUE
)
Arguments:
- See above for full list.
Value:
A list containing survival estimates, variances, differences, pointwise
and simultaneous confidence intervals, and (optionally) resampled
curves.
Workflow Integration:
- Use get_event_risk_matrices to construct event/risk
matrices for your data. - Use KM_estimates to compute
survival estimates and variances. - Use resampling_survival
for confidence band resampling. - Use KM_diff to compare
groups and summarize results.
Example Workflow:
# Prepare data
event_risk <- get_event_risk_matrices(U = mydata$time, at.points = seq(0, max(mydata$time), by = 1))
km_results <- KM_estimates(ybar = event_risk$risk_mat, nbar = event_risk$event_mat)
resampled <- resampling_survival(U = mydata$time, W = rep(1, nrow(mydata)), D = mydata$status,
at.points = seq(0, max(mydata$time), by = 1), draws.band = 1000,
surv = km_results$S_KM, G_draws = matrix(rnorm(1000 * nrow(mydata)), ncol = 1000))
km_diff_results <- KM_diff(df = mydata, tte.name = "time", event.name = "status", treat.name = "group",
risk.points = seq(0, max(mydata$time), by = 1), draws.band = 1000)
Description:
Plots the difference in Kaplan-Meier survival curves between two groups
(e.g., treatment vs. control), optionally including simultaneous
confidence bands and subgroup curves. The function also displays risk
tables for the overall population and specified subgroups.
Usage:
plotKM.band_subgroups(
df, tte.name, event.name, treat.name, weight.name = NULL,
sg_labels = NULL, ltype = "s", lty = 1, draws = 20, lwd = 2,
sg_colors = NULL, color = "lightgrey", ymax.pad = 0.01, ymin.pad = -0.01,
taus = c(-Inf, Inf), yseq_length = 5, cex_Yaxis = 0.8, risk_cex = 0.8,
by.risk = 6, risk.add = NULL, xmax = NULL, ymin = NULL, ymax = NULL, ymin.del = 0.035,
y.risk1 = NULL, y.risk2 = NULL, ymin2 = NULL, risk_offset = NULL, risk.pad = 0.01,
risk_delta = 0.0275, tau_add = NULL, time.zero.pad = 0, time.zero.label = 0.0,
xlabel = NULL, ylabel = NULL, Maxtau = NULL, seedstart = 8316951,
ylim = NULL, draws.band = 20, qtau = 0.025, show_resamples = FALSE
)
Arguments:
- df: Data frame containing survival data. -
tte.name, event.name, treat.name,
weight.name: Column names for time-to-event, event
indicator, treatment group, and weights. - sg_labels:
Character vector of subgroup definitions (as logical expressions). -
sg_colors: Colors for subgroup curves. -
ltype, lty, lwd: Line type,
style, and width for curves. - draws,
draws.band: Number of draws for resampling and simultaneous
bands. - color: Color for confidence band polygon. -
ymax.pad, ymin.pad, ylim,
ymin, ymax: Plot y-axis padding and limits. -
risk_cex, cex_Yaxis: Text size for risk table
and axis. - by.risk, risk.add,
risk_delta, risk_offset,
risk.pad: Risk table settings. - xlabel,
ylabel: Axis labels. - Maxtau,
tau_add, taus: Time truncation settings. -
show_resamples: Logical; whether to plot resampled curves.
- …and other graphical parameters.
Value:
(Invisible) list containing: - fit_itt: KM_diff results for
the full population. - xpoints: Time points used. -
Dhat_subgroups: Difference curves for subgroups. -
s0_subgroups, s1_subgroups: Survival curves
for control and treatment in subgroups. - rpoints: Risk
table time points. - Risk_subgroups: Risk table counts for
subgroups. - mean, lower, upper:
Main difference curve and confidence intervals.
Details:
- Uses KM_diff to compute survival differences and
confidence intervals. - Plots the main difference curve, confidence
bands, and subgroup curves. - Displays risk tables for the overall
population and subgroups. - Handles simultaneous confidence bands if
draws.band > 0.
Example:
plotKM.band_subgroups(
df = mydata,
tte.name = "time",
event.name = "status",
treat.name = "group",
weight.name = "weights",
sg_labels = c("age > 65", "sex == 'F'"),
sg_colors = c("red", "blue"),
draws.band = 1000
)
Key Features: - Calculates the weighted log-rank
statistic and its variance. - Computes the difference in survival at a
user-specified time (tzero). - Estimates the variance of
the survival difference. - Computes the covariance and correlation
between the log-rank statistic and the survival difference. - Supports
multiple weighting schemes (Fleming-Harrington, Schemper, Xu &
O’Quigley, Maggir-Burman, custom time-based, and custom codes) via the
scheme argument.
Typical Inputs: - dfcounting: List from
df_counting containing risk sets, event counts, and
survival estimates. - rho, gamma: Weighting
parameters (for Fleming-Harrington and custom schemes). -
tzero: Time point for evaluating the survival difference. -
scheme: Weighting scheme (e.g., “fh”, “schemper”, “XO”,
“MB”, “custom_time”, “custom_code”). - Additional arguments for specific
schemes (e.g., Scensor, Ybar,
tpoints, t.tau, w0.tau,
w1.tau, mb_tstar).
Typical Outputs: - A list containing: -
lr: Weighted log-rank test statistic. -
sig2_lr: Variance of the log-rank statistic. -
dhat: Difference in survival at tzero. -
sig2_dhat: Variance of the survival difference. -
cov_wlr_dhat: Covariance between log-rank and survival
difference. - cor_wlr_dhat: Correlation between log-rank
and survival difference.
Main Steps: 1. Extracts risk sets, event counts, and
survival estimates from dfcounting. 2. Computes
time-dependent weights using wt.rg.S according to the
selected scheme. 3. Calculates the log-rank statistic, survival
difference at tzero, their variances, covariance, and
correlation. 4. Returns all results in a list for further analysis or
reporting.